[DAY15] Groundwork for building a handwritten digit recognition system with Keras, part 2: the MNIST dataset

2022 iThome Ironman Contest

DAY 15

AI & Data

The hello world of machine learning: learning ML in 30 days with a handwritten digit recognition system, part 15 of the series

14th Ironman Contest

sysherry

2022-09-29 12:48:15


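Continuing from Day14, this is the third part of the groundwork and materials we need before building the handwritten digit recognition system!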

The code covered in this post can be found here → DAY15-MNIST

III. The MNIST handwritten digit dataset

What is the MNIST dataset? It is a small database of handwritten digits containing 60000 training images and 10000 test images. Every image has been preprocessed and formatted to a size of 28*28 pixels.

We can download the MNIST dataset through Keras, where it is one of the built-in small datasets [Note 1]. Let's now take a look at the format and data types of this dataset.

1. Import the MNIST dataset and look at its format

from tensorflow import keras

# load_data() downloads MNIST on first use and returns two (images, labels) tuples
(train_image, train_label), (test_image, test_label) = keras.datasets.mnist.load_data()

# Print the shape of each array
print("train image dataset =", train_image.shape)
print("train label dataset =", train_label.shape)
print("test image dataset =", test_image.shape)
print("test label dataset =", test_label.shape)

As we can see, the training image dataset is three-dimensional: 60000 images, each 28x28 in size.
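The shape alone does not tell us the data type or the range of pixel values. Here is a minimal sketch of one way to check them, assuming a hypothetical helper named `summarize` (it is not part of Keras). To keep it runnable without downloading anything, it uses a small random stand-in array laid out like MNIST. With the real data you would pass `train_image` and `train_label` instead.

```python
import numpy as np

def summarize(images, labels):
    """Return basic facts about an image array and its label array.

    Hypothetical helper for illustration; not part of Keras.
    """
    return {
        "shape": images.shape,
        "dtype": str(images.dtype),
        "min": int(images.min()),
        "max": int(images.max()),
        # how many samples there are of each digit 0..9
        "label_counts": np.bincount(labels, minlength=10).tolist(),
    }

# Stand-in data (an assumption for illustration, NOT the real MNIST):
# 100 grayscale images of 28x28 uint8 pixels, and 100 labels cycling through 0..9.
rng = np.random.default_rng(0)
fake_image = rng.integers(0, 256, size=(100, 28, 28), dtype=np.uint8)
fake_label = np.arange(100, dtype=np.uint8) % 10

info = summarize(fake_image, fake_label)
for key, value in info.items():
    print(key, "=", value)
```

Calling `summarize(train_image, train_label)` on the arrays loaded above should report `uint8` pixels in the range 0 to 255, along with how many training samples each digit has.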

Next, let's plot the images to see what they look like.

import matplotlib.pyplot as plt

plt.figure(figsize=(14, 14))  # set the size of the figure

# Draw the first 10 training images; the 5x5 grid has 25 cells, only 10 are used
for i in range(0, 10):
    ax = plt.subplot(5, 5, 1 + i)
    ax.imshow(train_image[i])
    title = "label=" + str(train_label[i])
    ax.set_title(title, fontsize=14)
plt.tight_layout()
plt.show()
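As a variation on the plot above, here is a small sketch that wraps the same idea in a reusable function. It uses a 2x5 grid so there are no empty cells, a grayscale colormap, and hidden axis ticks. The function name `plot_digits` and the stand-in arrays are assumptions made for illustration; with the real data you would pass `train_image` and `train_label`.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_digits(images, labels, rows=2, cols=5):
    """Draw the first rows*cols images in a grid, each titled with its label.

    Hypothetical helper for illustration.
    """
    fig, axes = plt.subplots(rows, cols, figsize=(2 * cols, 2.2 * rows), squeeze=False)
    for i, ax in enumerate(axes.flat):
        ax.imshow(images[i], cmap="gray")  # grayscale instead of the default colormap
        ax.set_title("label=" + str(labels[i]), fontsize=14)
        ax.axis("off")                     # hide the pixel-coordinate ticks
    fig.tight_layout()
    return fig

# Stand-in data (an assumption for illustration, NOT the real MNIST):
# ten black 28x28 images with a white square in the middle, labelled 0..9.
demo_image = np.zeros((10, 28, 28), dtype=np.uint8)
demo_image[:, 10:18, 10:18] = 255
demo_label = np.arange(10)

fig = plot_digits(demo_image, demo_label)
# In an interactive session, call plt.show() to display the figure.
```

Returning the figure instead of calling `plt.show()` inside the function leaves it to the caller to display it or save it with `fig.savefig(...)`.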